Regional bullying recognition based on joint hierarchical attentional network and independent recurrent neural network

MENG Zhao, TIAN Shengwei, YU Long, WANG Ruijin

Journal of Computer Applications 2019, 39 (8): 2450-2455. DOI: 10.11772/j.issn.1001-9081.2019010033

To make better use of deep contextual information in text, a regional bullying semantic recognition model called HACBI (HAN_CNN_BiLSTM_IndRNN) was proposed, based on the Hierarchical Attention Network (HAN) and the Independent Recurrent Neural Network (IndRNN). Firstly, manually annotated regional bullying texts were mapped into a low-dimensional vector space by word embedding. Secondly, the local and global semantic information of the bullying texts was extracted with a Convolutional Neural Network (CNN) and Bidirectional Long Short-Term Memory (BiLSTM), and the internal structure information of the text was captured by HAN. Finally, to avoid losing text hierarchy information and to solve the vanishing gradient problem, IndRNN was introduced to enhance the descriptive ability of the model and integrate the information flow. Experimental results show that the model achieves Accuracy (Acc), Precision (P), Recall (R), F1-Measure (F1) and Area Under Curve (AUC) values of 99.57%, 98.54%, 99.02%, 98.78% and 99.35% respectively, a significant improvement over text classification models such as Support Vector Machine (SVM) and CNN.
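
For readers unfamiliar with IndRNN, the following is a minimal sketch of its recurrent update as it is commonly stated in the IndRNN literature: each hidden unit keeps only a scalar recurrent weight, so h_t = relu(W x_t + u * h_(t-1) + b) with an element-wise product. It is not the HACBI implementation; the function names and the toy weights are assumptions made for illustration.

```python
from typing import List


def indrnn_step(x_t: List[float], h_prev: List[float],
                W: List[List[float]], u: List[float],
                b: List[float]) -> List[float]:
    """One IndRNN step: h_t = relu(W x_t + u * h_prev + b).

    u is a vector, not a matrix: unit n only sees its own previous state.
    """
    h_t = []
    for n in range(len(u)):
        pre = sum(W[n][j] * x_t[j] for j in range(len(x_t)))
        pre += u[n] * h_prev[n] + b[n]
        h_t.append(max(0.0, pre))  # ReLU activation
    return h_t


def indrnn_forward(xs: List[List[float]], W: List[List[float]],
                   u: List[float], b: List[float]) -> List[float]:
    """Run the cell over a sequence, starting from a zero state."""
    h = [0.0] * len(u)
    for x_t in xs:
        h = indrnn_step(x_t, h, W, u, b)
    return h


# Toy example (hypothetical weights): identity input map, two time steps.
W = [[1.0, 0.0], [0.0, 1.0]]
u = [0.5, 1.0]
b = [0.0, 0.0]
final_state = indrnn_forward([[1.0, 2.0], [1.0, 2.0]], W, u, b)
```

Because the recurrent weight is a per-unit scalar, its magnitude can be bounded directly, which is the property usually cited for keeping gradients from vanishing or exploding over long sequences.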

Android malware detection based on texture fingerprint and malware activity vector space

LUO Shiqi, TIAN Shengwei, YU Long, YU Jiong, SUN Hua

Journal of Computer Applications 2018, 38 (4): 1058-1063. DOI: 10.11772/j.issn.1001-9081.2017102499

To improve the accuracy and automation of malware recognition, an Android malware analysis and detection method based on deep learning was proposed. Firstly, a malware texture fingerprint was proposed to reflect the content similarity of malicious code binary files, and 33 types of malware activity were selected to form an activity vector space reflecting the potential dynamic activities of malicious code. In addition, to improve classification accuracy, an AutoEncoder (AE) and a Softmax classifier were trained on the combination of these features. Tests on different data samples showed that the average classification accuracy of the proposed method reached 94.9% with a Stacked AE (SAE), 1.1 percentage points higher than that of Support Vector Machine (SVM). The proposed method can effectively improve the accuracy of malicious code recognition.
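
The abstract does not say how the texture fingerprint is built. One common way to view a binary file as a texture, shown here only as a sketch under that assumption, is to read each byte as a grayscale pixel (0 to 255) and wrap the byte stream into fixed-width rows. The function name, the default width and the zero padding are hypothetical choices, not details from the paper.

```python
from typing import List


def bytes_to_gray_image(data: bytes, width: int = 16) -> List[List[int]]:
    """Wrap a byte stream into rows of `width` grayscale pixels.

    Each byte becomes one pixel intensity in [0, 255]; the last row is
    zero-padded so that every row has the same length.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row.extend([0] * (width - len(row)))  # pad the final row
        rows.append(row)
    return rows


# Five bytes wrapped at width 2 give three rows, the last one padded.
image = bytes_to_gray_image(bytes([0, 255, 16, 32, 7]), width=2)
# image == [[0, 255], [16, 32], [7, 0]]
```

Files with similar byte content then produce visually similar images, from which texture features can be extracted for a classifier.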

Water body extraction method based on stacked autoencoder

WANG Zhiyin, YU Long, TIAN Shengwei, QIAN Yurong, DING Jianli, YANG Liu

Journal of Computer Applications 2015, 35 (9): 2706-2709. DOI: 10.11772/j.issn.1001-9081.2015.09.2706

To improve the accuracy and automation of water body extraction from remote sensing images, a method based on Stacked AutoEncoder (SAE) was proposed. A deep network model was built by stacking sparse autoencoders, and each layer was trained in turn with the greedy layer-wise approach. Features were learned without supervision at the pixel level, avoiding the manual feature analysis and selection required by methods such as traditional neural networks. A Softmax classifier was then trained with supervision on the learned features and their corresponding labels, and the Back Propagation (BP) algorithm was used to fine-tune the whole model. In experiments on ETM+ data of the Tarim River, the SAE-based method reaches an accuracy of 94.73%, which is 3.28% and 4.04% higher than that of Support Vector Machine (SVM) and BP neural network respectively. The experimental results show that the proposed method can effectively improve the accuracy of water body extraction.

Distributed massive molecule retrieval model based on consistent Hash

SUN Xia, YU Long, TIAN Shengwei, YAN Yilin, LIN Jiangli

Journal of Computer Applications 2015, 35 (4): 956-959. DOI: 10.11772/j.issn.1001-9081.2015.04.0956

To address the inefficiency of traditional general graph matching search and the difficulty of quickly locating refractive index data in a big data environment, a distributed massive molecular retrieval model based on a consistent Hash function was established. To improve molecular retrieval efficiency, and in keeping with the characteristics of molecular storage structures, the continuous refractive index was discretized by a fixed-width algorithm to build a high-speed Hash index, and a distributed massive retrieval system was implemented. The size of the dataset was effectively reduced, and Hash collisions were handled according to visiting frequency. The experimental results show that, on chemical data containing 200 thousand molecular structures, the average time of this method is about five percent of that of traditional general graph matching search. In addition, the model performs steadily and scales well. It is suitable for retrieving high-frequency molecules by refractive index in a massive data environment.
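
The two ideas named in the abstract, fixed-width discretization of a continuous value and consistent hashing across nodes, can be sketched as follows. This is an illustration under assumptions, not the paper's system: the bucket width of 0.001, the node names, the use of MD5 and the number of virtual nodes are all hypothetical.

```python
import bisect
import hashlib
import math
from typing import List


def bucket_of(refractive_index: float, width: float = 0.001) -> int:
    """Fixed-width discretization: map a continuous value to a bucket id."""
    # Rounding first guards against floating-point noise at bucket edges.
    return math.floor(round(refractive_index / width, 6))


def _hash(text: str) -> int:
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring with virtual nodes."""

    def __init__(self, nodes: List[str], replicas: int = 64) -> None:
        points = sorted((_hash(f"{node}#{i}"), node)
                        for node in nodes for i in range(replicas))
        self._keys = [key for key, _ in points]
        self._nodes = [node for _, node in points]

    def node_for(self, bucket: int) -> str:
        """Return the first node clockwise from the bucket's hash."""
        idx = bisect.bisect(self._keys, _hash(str(bucket))) % len(self._keys)
        return self._nodes[idx]


ring = HashRing(["node-a", "node-b", "node-c"])
bucket = bucket_of(1.3334)        # bucket id 1333
owner = ring.node_for(bucket)     # node that stores molecules in this bucket
```

The point of the ring is that adding a node moves only the buckets that now fall to the new node; every other bucket keeps its previous owner, so most of the index stays in place when the cluster grows.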

Workflow task scheduling algorithm based on resource clustering in cloud computing environment

GUO Fenguu, YU Long, TIAN Shengwei, YU Jiong, SUN Hua

Journal of Computer Applications 2013, 33 (08): 2154-2157.

Focusing on the large-scale, heterogeneous and dynamic characteristics of resources in cloud computing, a workflow task scheduling algorithm based on fuzzy clustering of resources was proposed. After quantizing and normalizing the resource characteristics, the algorithm applied clustering theory to partition the resources, based on a workflow task model and a resource model constructed in advance. In the scheduling stage, the cluster with the better overall performance was chosen first, which shortened the time needed to match tasks to resources and improved scheduling performance. The algorithm was compared with HEFT (Heterogeneous Earliest Finish Time) and DLS (Dynamic Level Scheduling). The experimental results show that, as the number of tasks increased within the range [0,100], the average SLR (Schedule Length Ratio) of this algorithm was smaller than that of HEFT by 3.4% and that of DLS by 9.9%, and its average speedup was higher than that of HEFT by 59% and that of DLS by 10.2%; as the number of resources increased within the range [0,100], its average SLR was smaller than that of HEFT by 3.6% and that of DLS by 9.7%, and its average speedup was higher than that of HEFT by 4.5% and that of DLS by 10.8%. The results indicate that the proposed algorithm partitions resources reasonably and surpasses the HEFT and DLS algorithms in makespan.
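
The abstract reports SLR and speedup without defining them. The sketch below assumes the definitions commonly used in the HEFT literature: SLR divides the makespan by the sum of the minimum computation costs of the critical-path tasks, and speedup divides the best single-resource sequential time by the makespan. The paper may define them differently; the cost table, critical path and makespan here are hypothetical.

```python
from typing import Dict, List


def schedule_length_ratio(makespan: float,
                          cost: Dict[str, List[float]],
                          critical_path: List[str]) -> float:
    """SLR = makespan / sum of min computation cost of critical-path tasks.

    cost[task][r] is the computation cost of `task` on resource r.
    Lower is better.
    """
    lower_bound = sum(min(cost[task]) for task in critical_path)
    return makespan / lower_bound


def speedup(makespan: float, cost: Dict[str, List[float]]) -> float:
    """Speedup = best sequential time on one resource / makespan."""
    n_resources = len(next(iter(cost.values())))
    sequential = min(sum(costs[r] for costs in cost.values())
                     for r in range(n_resources))
    return sequential / makespan


# Hypothetical workflow: three tasks, two resources.
cost = {"t1": [4.0, 6.0], "t2": [3.0, 2.0], "t3": [5.0, 5.0]}
slr = schedule_length_ratio(10.0, cost, critical_path=["t1", "t3"])
spd = speedup(10.0, cost)
```

With these numbers the critical-path lower bound is 9 and the best sequential time is 12, so a makespan of 10 gives an SLR of about 1.11 and a speedup of 1.2.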

Improved suffix tree clustering for Uyghur text

ZHAI Xian-min, TIAN Sheng-wei, YU Long, FENG Guan-jun

Journal of Computer Applications 2012, 32 (04): 1078-1081. DOI: 10.3724/SP.J.1087.2012.01078

To address the non-standard, repeated and redundant information that arises when selecting base class phrases, an improved Suffix Tree Clustering (STC) method was proposed. Firstly, a phrase mutual information algorithm was put forward to choose base class phrases that conform to Uyghur grammar. Secondly, to reduce repeated base class phrases, a phrase reduction algorithm based on Uyghur grammar was proposed. Thirdly, building on the first two steps, a phrase redundancy algorithm based on Uyghur grammar was constructed to remove redundant phrases. The experimental results show that this method improves recall and precision compared with STC, which indicates that the improved algorithm can effectively enhance clustering performance.
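
As background for the baseline, the sketch below illustrates the two core steps of standard STC: finding base clusters (phrases shared by at least two documents) and testing whether two base clusters overlap enough to be merged. For brevity it enumerates word n-grams instead of building a real suffix tree, and it does not include the Uyghur grammar rules of the improved method. The 0.5 overlap threshold is the value usually quoted for STC; the function names and the English toy documents are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Phrase = Tuple[str, ...]


def base_clusters(docs: List[str], max_len: int = 3,
                  min_docs: int = 2) -> Dict[Phrase, Set[int]]:
    """Map each phrase (up to max_len words) to the documents containing it,
    keeping only phrases shared by at least min_docs documents."""
    phrase_docs: Dict[Phrase, Set[int]] = defaultdict(set)
    for doc_id, text in enumerate(docs):
        words = text.split()
        for i in range(len(words)):
            for n in range(1, max_len + 1):
                if i + n <= len(words):
                    phrase_docs[tuple(words[i:i + n])].add(doc_id)
    return {p: d for p, d in phrase_docs.items() if len(d) >= min_docs}


def similar(a: Set[int], b: Set[int], threshold: float = 0.5) -> bool:
    """Two base clusters merge if their overlap exceeds the threshold
    relative to both clusters."""
    shared = len(a & b)
    return shared / len(a) > threshold and shared / len(b) > threshold


docs = ["cat ate cheese", "mouse ate cheese too", "cat ate mouse too"]
clusters = base_clusters(docs)
mergeable = similar(clusters[("ate", "cheese")], clusters[("ate",)])
```

In this toy corpus the phrase "ate cheese" covers documents 0 and 1 and "ate" covers all three, so the two base clusters pass the overlap test and would be merged.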

Automatic identification of Uyghur domain term in Web text

ZHONG Jun, TIAN Sheng-wei, YU Long

Journal of Computer Applications 2012, 32 (02): 407-410. DOI: 10.3724/SP.J.1087.2012.00407

Because Uyghur domain terms are difficult to obtain and expanding them manually is laborious and inefficient, this paper used the Conditional Random Field (CRF) to identify Uyghur domain terms in Web texts, and expanded the domain terms using conjunction words and the Mutual Information (MI) between words, based on the co-occurrence of terms. Experiments on the collected Web texts show that the algorithm achieves a precision of 97.59% and a recall of 93.38% for short Uyghur domain terms, and a precision of 55.72% for long Uyghur domain terms.
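
The abstract does not give the exact MI formula used. A common choice for scoring how strongly two adjacent words belong together is pointwise mutual information, PMI(a, b) = log2(P(a, b) / (P(a) P(b))), estimated from corpus counts. The sketch below assumes that formulation; the function name and the placeholder tokens are hypothetical and stand in for real Uyghur text.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def pmi_table(sentences: List[str]) -> Dict[Tuple[str, str], float]:
    """Pointwise mutual information for every adjacent word pair.

    P(a) and P(b) are estimated from unigram counts and P(a, b) from
    adjacent-pair counts within each sentence.
    """
    word_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for sentence in sentences:
        words = sentence.split()
        word_counts.update(words)
        pair_counts.update(zip(words, words[1:]))
    n_words = sum(word_counts.values())
    n_pairs = sum(pair_counts.values())
    table = {}
    for (a, b), count in pair_counts.items():
        p_ab = count / n_pairs
        p_a = word_counts[a] / n_words
        p_b = word_counts[b] / n_words
        table[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return table


# Placeholder tokens; a real run would use segmented Uyghur Web text.
scores = pmi_table(["a b", "a b", "c d"])
```

Pairs with a high score co-occur more often than chance would predict, which makes them candidates for multi-word domain terms.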
